-
Geochemistry is a data-driven discipline. Modern laboratories produce highly diverse data, and the recent exponential increase in data volumes is challenging established practices and capabilities for organizing, analyzing, preserving, and accessing these data. At the same time, sophisticated computational techniques, including machine learning, are increasingly applied to geochemical research questions, which require easy access to large volumes of high-quality, well-organized, and standardized data. Data management has been important since the beginning of geochemistry but has recently become a necessity for the discipline to thrive in the age of digitalization and artificial intelligence. This paper summarizes the landscape of geochemical databases, distinguishing different types of data systems based on their purpose, and their evolution in a historic context. We apply the life cycle model of geochemical data; explain the relevance of current standards, practices, and policies that determine the design of modern geochemical databases and data management; discuss the ethics of data reuse, such as data ownership, data attribution, and data citation; and finally create a vision for the future of geochemical databases: data being born digital, connected to agreed community standards, and contributing to global democratization of geochemical data.
-
Afzaal, Muhammad (Ed.) Environmental challenges are rarely confined to national, disciplinary, or linguistic domains. Convergent solutions require international collaboration and equitable access to new technologies and practices. Working effectively in international, multidisciplinary and multilingual research teams can be challenging. A major impediment to innovation in diverse teams often stems from different understandings of the terminology used. These can vary greatly according to the cultural and disciplinary backgrounds of the team members. In this paper we take an empirical approach to examine sources of terminological confusion and their effect in a technically innovative, multidisciplinary, multinational, and multilingual research project, adhering to Open Science principles. We use guided reflection of participant experience in two contrasting teams, one applying Deep Learning (Artificial Intelligence) techniques and the other developing guidance for Open Science practices, to identify and classify the terminological obstacles encountered and reflect on their impact. Several types of terminological incongruities were identified, including fuzziness in language, disciplinary differences and multiple terms for a single meaning. A novel or technical term did not always exist in all domains, or if known, was not fully understood or adopted. Practical matters of international data collection and comparison included an unanticipated need to incorporate different types of data labels from country to country, authority to authority. Sometimes these incongruities could be solved quickly, sometimes they stopped the workflow. Active collaboration and mutual trust across the team enhanced workflows, as incompatibilities were resolved more speedily than otherwise.
Based on the research experience described in this paper, we make six recommendations accompanied by suggestions for their implementation to improve the success of similar multinational, multilingual and multidisciplinary projects. These recommendations are conceptual, drawing on a single experience, and remain sources for discussion and testing by others embarking on their research journey.
Free, publicly-accessible full text available December 5, 2025
-
Physical samples and their associated (meta)data underpin scientific discoveries across disciplines, and can enable new science when appropriately archived. However, there are significant gaps in community practices and infrastructure that currently prevent accurate provenance tracking, reproducibility, and attribution. For the vast majority of samples, descriptive metadata is often sparse, inaccessible, or absent. Samples and associated (meta)data may also be scattered across numerous physical collections, data repositories, laboratories, data files, and papers with no clear linkages or provenance tracking as new information is generated over time. The Physical Samples Curation Cluster has therefore developed ‘A Scientific Author Guide for Publishing Open Research Using Physical Samples.’ This involved synthesizing existing practices and community feedback, and assessing real-world examples to identify community and infrastructure needs. We identified areas of work needed to enable authors to efficiently reference samples and related data, link related samples and data, and track their use. Our goal is to help improve the discoverability, interoperability, and use of physical samples and associated (meta)data into the future.
-
There is growing recognition that unambiguous citation and tracking of physical samples allows previously impossible linking of samples to data and publications, enables linking and integration of sample-based observations across data systems, and paves the road towards advanced data mining of sample-based data. In recent years, the use of Persistent Identifiers (PIDs) for physical samples to support such citation and tracking has increased. The IGSN (International Geo Sample Number) is a PID for physical samples. It was originally developed for the solid earth sciences, and has evolved into an international PID system with members on five continents and a network of active allocating agents. It has been adopted by a growing number and range of stakeholders worldwide, including national geological surveys, research infrastructure providers, collection curators, researchers, and data managers, and by other disciplines that need to refer to physical samples. Nearly 6.9 million samples have been registered with IGSNs so far. The IGSN system uses the Handle System (Kahn and Wilensky 1995; see also Handle.Net®) and has an international organization, IGSN e.V., to manage its governance structure and the technical architecture. The recent expansion of the IGSN beyond the geosciences into other domains such as biodiversity, archeology, and material sciences confirms the power of its concept and implementation, but imposes substantial pressures on the existing capacity and capabilities of the IGSN architecture and its governing organization. Modifications to the IGSN organizational and technical architecture are necessary at this point to keep pace with the growing demand and expectations. These changes are also necessary to ensure trustworthy and sustainable services for PID registration and resolution in a maturing research data ecosystem.
The essential criteria for a trustworthy system include an organizational foundation that ensures longevity, sustainability, proper governance, and regular quality assessment of registration services. They also include a reliable and secure technical platform, based on open standards, which is sufficiently scalable and flexible to accommodate the growing diversity of specimen types, use cases, and stakeholder requirements. In 2018, a major planning project for the IGSN was funded by the Alfred P. Sloan Foundation. An international group of experts participates in re-designing and improving the existing organization and technical architecture of the IGSN system, revising the current business model of the IGSN e.V. and professionalizing its operations. The goal is for the IGSN system to be able to respond to, and support in a sustainable manner, the rapidly growing demands of a global and increasingly multi-disciplinary user community, and to ensure that the IGSN will be a trustworthy, stable, and adaptable persistent identifier system for material samples, both technically and organizationally. The end result should also satisfy and facilitate participation across research domains, and will be a reliable component of the evolving research data ecosystem. Finally, it will ensure that the IGSN is recognized as a trusted partner by data infrastructure providers and the science community alike.
